Probability Weighted Ensemble Transfer Learning for Predicting Interactions between HIV-1 and Human Proteins
Abstract
Reconstructing host-pathogen protein interaction networks is of great significance for revealing the underlying mechanisms of microbial pathogenesis. However, current experimentally derived networks are generally small and should be augmented by computational methods to allow less biased biological inference. From a computational modelling perspective, data scarcity, data unavailability and negative data sampling are the three major problems in reconstructing these networks. In this work, we address these three concerns and propose a probability weighted ensemble transfer learning model (PWEN-TLM) for HIV-human protein interaction prediction, in which a support vector machine (SVM) is adopted as the individual classifier of the ensemble. In the model, data scarcity and data unavailability are tackled by homolog knowledge transfer. The importance of homolog knowledge is measured by the ROC-AUC metric of the individual classifiers, whose outputs are probability weighted to yield the final decision. In addition, we validate the assumption that homolog knowledge alone is sufficient to train a satisfactory model for host-pathogen protein interaction prediction. The model is thus more robust to data unavailability and imposes less demanding data constraints. As regards negative data construction, experiments show that sampling based on the exclusion of subcellularly co-localized proteins is unbiased and more reliable than random sampling. Finally, we analyze the predictions that overlap between our model and existing models, and apply the model to recognize novel host-pathogen PPIs for further biological research.
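The probability weighting described in the abstract can be illustrated with a minimal sketch. It assumes that each individual classifier already outputs an interaction probability and that the weights are the classifiers' ROC-AUC values normalized to sum to one; the abstract does not give the exact weighting formula, so this scheme, the function names and the toy numbers are all hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_probability(probs: Sequence[float], aucs: Sequence[float]) -> float:
    """Combine per-classifier probabilities with AUC-proportional weights.

    Assumed scheme: weight_i = auc_i / sum(aucs). The source does not
    specify the normalization.
    """
    total = sum(aucs)
    return sum(a * p for a, p in zip(aucs, probs)) / total


# Hypothetical validation scores from two individual classifiers.
labels = [1, 1, 0, 0]
scores_a = [0.9, 0.8, 0.3, 0.1]  # ranks every positive above every negative
scores_b = [0.9, 0.2, 0.6, 0.1]  # ranks one positive below one negative

aucs = [roc_auc(labels, scores_a), roc_auc(labels, scores_b)]

# Hypothetical probabilities the two classifiers assign to a new protein pair.
combined = weighted_probability([0.8, 0.4], aucs)
predicted_interaction = combined >= 0.5
```

In this sketch the classifier with the higher AUC contributes more to the final decision, which mirrors the idea that more reliable homolog knowledge should carry more weight.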
Similar resources
Semi-supervised multi-task learning for predicting interactions between HIV-1 and human proteins
MOTIVATION Protein-protein interactions (PPIs) are critical for virtually every biological function. Recently, researchers suggested to use supervised learning for the task of classifying pairs of proteins as interacting or not. However, its performance is largely restricted by the availability of truly interacting proteins (labeled). Meanwhile, there exists a considerable amount of protein pai...
A Combination Method of Centrality Measures and Biological Properties to Improve Detection of Protein Complexes in Weighted PPI Networks
Introduction: In protein-protein interaction networks (PPINs), a complex is a group of proteins that allows a biological process to take place. The correct identification of complexes can help better understanding of the function of cells used for therapeutic purposes, such as drug discoveries. One of the common methods for identifying complexes in the PPINs is clustering, but this study aimed ...
Combination of Ensemble Data Mining Methods for Detecting Credit Card Fraud Transactions
As we know, credit cards speed up and make life easier for all citizens and bank customers. They can use it anytime and anyplace according to their personal needs, instantly and quickly and without hassle, without worrying about carrying a lot of cash and more security than having liquidity. Together, these factors make credit cards one of the most popular forms of online banking. This has led ...
A Novel Biclustering Approach to Association Rule Mining for Predicting HIV-1–Human Protein Interactions
Identification of potential viral-host protein interactions is a vital and useful approach towards development of new drugs targeting those interactions. In recent days, computational tools are being utilized for predicting viral-host interactions. Recently a database containing records of experimentally validated interactions between a set of HIV-1 proteins and a set of human proteins has been...